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1.  Objectives  and  status 

The  objective  of  this  program  has  been  to  develop  optoelectronic  technologies,  components,  and  sub¬ 
systems  for  the  implementation  of  large-scale  learning  neural  networits  that  can  be  cMistruct^  of  cranpact 
(^toelectrorric  multi-chip  modules.  Neural  networks  belong  to  a  distinctive  class  of  computing  paradigms 
in  that  they  naturally  exhibit  parallelism  and  alleviate  the  control  task  by  learning  from  examples.  Our  goal 
is  to  demonstrate  these  key  features  in  a  hardware  implementation. 

During  this  reporting  period,  we  ccMiducted  research  in  three  directions  to  make  progress  towards  that  goal: 

•  Refinement  of  a  the  Fuzzy  ARTMAP  neural  network  algorithm.  The  algorithm  is  very  weU  suited  to 
hardware  implementatiai,  but  previously  did  not  perform  well  on  most  real  problems. 

•  Design,  fabrication,  and  chararterizatiai  of  birefringent  canputer  generated  holograms  to  implement 
the  OTIS  optical  system  for  reflective  modulators.  This  optical  system  provides  a  ^-shuffle 
interconnection  pattern  that  may  be  used  to  implement  a  backpropagation  neural  network. 

•  Devdqjmertt  of  a  simple,  compact,  efficient  optical  system  (OTIS-VE)  compatible  with  surface- 
normal  transmitters  (such  as  VCSELs)  implementing  the  ^-shuffle  interconnection  pattern. 


2.  Fuzzy  ARTMAP  modifications  for  intersecting  class  distributions 

The  Fuzzy  ARTMAP  neural  networic  performs  a  supervised  clustering  operatioa  It  may  be  applied  to 
vector  quantization  or  classification  of  data,  irtcluding  speech,  images,  and  radar  range  profiles.  The 
algorithm  is  very  well  suited  to  implementation  in  parallel  hardware*.  However,  as  currently  defined,  the 
algorithm  performs  poorly  due  to  cluster  proliferation  when  the  class  distributions  of  the  training  data 
intersect,  as  commonly  occurs  with  real-world  data.  We  propose  several  modifications  to  the  algorithm 
which  eliminate  the  cluster  proliferation  problem.  The  performance  of  the  original  and  modified  algorithms 
is  illustrated  using  the  Peterson-Bamey  vowel  data^. 

2.1.  Cluster  proliferation 

Fuzzy  art’  is  a  clustering  algorithm  that  (grates  «i  veaors  with  analog-valued  elements.  Adding  a 
further  layer  of  processing  (the  “MAP  field”)  to  Fuzzy  ART  yields  a  supervised  clustering  algorithm. 
Fuzzy  ARTM/j**.  During  learning.  Fuzzy  ARTMAP  is  presented  with  training  vectors  which  have  been 
labeled  according  to  the  class  to  which  each  belongs.  For  example,  vectors  of  features  extracted  from  short 
segments  of  speech  might  be  labeled  according  to  the  phoneme  being  spoken  during  each  segment.  The 
Fuzzy  ART  module  assigns  the  current  training  vector  to  a  cluster.  The  MAP  field  associates  a  class  with 
each  cluster.  If  the  class  label  of  the  training  vector  differs  from  the  class  associated  with  the  chosen 
cluster,  the  MAP  field  causes  the  Fuzzy  ART  search  to  be  repeated,  this  time  with  attention  restricted  to 
smaller  clusters.  If  this  procedure,  termed  match  tracking,  successfully  assigns  the  training  vector  to  a 
cluster  with  the  correct  class  association,  the  weights  defining  that  cluster  are  updated,  making  it  more 
likely  to  capture  that  training  vector  in  the  future.  If  no  cluster  with  the  correct  class  association  is  found,  a 
new  cluster  is  created. 

OassificatiOT  of  points  firan  intersecting  class  distribution  has  Iraig  been  identified  as  a  more  difficdt 
problem  than  classification  of  pdnts  from  separable  distributions’.  It  is  not  possible  to  classify  a  point 
lying  in  the  region  of  overlap  with  100%  certainty.  The  best  a  system  can  do  is  estimate  the  probability 
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that  a  training  prant  belongs  to  a  particular  class,  or  equivalentty  determine  a  fuzzy  value  of  the  class 
membership  of  the  data  point^  The  output  of  Fuzzy  ARTMAP  is  restricted  to  a  single  class  label.  In  this 
case,  the  best  generic  approach  is  to  select  the  class  from  which  ^  data  point  was  most  probably  drawn. 
Intersecting  distributions  are  canmrai  in  real  problems  due  to  noise,  classificaticm  uncertainty  and  errors, 
use  of  sub-optimal  features,  loss  of  cOTtext,  etc.  An  example  of  such  a  distributitMi  occurring  with  real- 
world  data  is  shown  in  Rgure  1. 

When  clouds  of  data  points  associated  with  several  classes  overlap.  Fuzzy  ARTMAP  tends  to  create  about 
one  cluster  per  data  pdnt.  We  refer  to  this  phenomenon  as  cluster  proliferatioa  This  is  a  direct  result  of 
the  fact  that  the  classes  are  not  separable  and  that  Fuzzy  ARTMAP  will  create  a  new  cluster  whenever  the 
training  point  can  not  be  assigned  to  the  correct  class  by  match  tracking.  While  this  resets  in  nearly 
perfect  classificatiMi  of  the  training  vectors,  performance  ai  the  test  data  (i.e.  generalizaticMi)  is  often  worse 
than  if  fewer  clusters  had  formed.  Furthermore,  the  time  required  for  the  Fuzzy  ART  search  and  the 
memory  required  to  store  the  weigjits  are  both  prc^iorticMial  to  the  number  of  clusters,  providing  additional 
incentives  to  limit  the  number  of  clusters  created. 


Figure  1  The  ten  American  vowels,  spoken  by  76  speakers.  Each  utterance  is  plotted  as  a  jxjint  in  a  two 
dimensional  space  defined  by  the  first  and  secaid  formant  frequencies.  The  class  distributions  are  strongly 
overlapping. 


2.2.  Standard  Fuzzy  ARTMAP  algorithm 

Fuzzy  ART  clusters  vectors  based  chi  two  separate  distance  criteria,  match  and  choice.  For  input  vector  7 
and  cluster  /.  the  match  function  is  defined  by 


1^1 
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where  w,  is  an  analog-valued  weight  vector  associated  with  cluster  j,  a  denotes  the  fuzzy  AND  operator, 
(p  A  =Tnin(p.y  (i.)j  and  the  norm  I— I  is  defined  by  ^  choice  functicHi  is  defined  by 

tit  ' 


Tjil) 


I  A 

a  + 

(1) 


where  a  is  a  small  constant.  Increasing  a  biases  the  search  more  toward  clusters  with  large  IwJ.  Fuzzy 
ART  assigns  each  input  vector  to  the  category  that  maximizes  Tp)  while  satisfying  Sp)  >p  (the  match 

criterion),  where  the  vigilance,  p,  is  in  the  range  0  1.  Unused  clusters  (“clusters”  associated  with 

unused  weight  vectors)  always  satisfy  the  match  criterion  and  have  a  constant  choice  value, 


The  MAP  field  may  be  aivisiaied  as  a  set  of  processing  elements  (PEs),  each  with  weights,  where 
N  is  the  number  of  clusters  formed  by  the  Fuzzy  ART  module  and  N  is  the  total  number  of  classes 
(categories).  It  is  basically  used  as  an  by  look-up  table.  During  training,  if  cluster  J  wins  the 
Fuzzy  ART  competitiwi  and  MAP  field  entry  (7,  class  J)  of  the  table  is  not  greater  than  some  threshold,  ff’’ , 
p  is  increased  and  the  Fuzzy  ART  search  is  repeated.  If  MAP  field  oitry  (J,  class j)  is  greater  than  p^  ,  the 
data  point  is  considered  to  have  been  correctly  classified.  Typically,  one  weight  of  each  MAP  field  PE 
equals  1  and  the  others  equal  0,  associating  a  single  class  with  each  cluster,  and  p^  =  0. 


Once  the  training  vector  has  been  assigned  to  the  correct  class,  the  weights  of  the  winning  cluster  are 


updated  by 


new  ^ 
WJi  - 


WJi  ^  li 
WJi  >  li 


where 0  <  1. 


Creating  a  new  cluster:  If  the  training  veaor  is  assigned  to  an  unused  cluster,  the  weight  vector  of  that 
cluster  is  set  equal  to  the  training  vecUM"  and  the  associated  MAP  field  weights  are  adjusted  to  reflect  the 
training  vector’s  class. 


23.  Modifications  to  Fuzzy  ARTMAP 
2J.1.  Pruning 

If  those  clusters  capturing  the  fewest  training  vectors  are  periodically  removed,  the  number  of  clusters  in 
use  approaches  a  steady  state  value.  Clusters  that  are  not  removed  can  grow  to  capture  training  vectors 
previously  in  the  pruned  clusters.  If  clusters  capturing  less  than  training  vectors  are  periodically 

removed,  the  absolute  maximum  number  of  remaining  clusters  is  where  is  the  total 

number  of  training  data  points.  In  practice,  the  number  of  clusters  formed  is  usually  significantly  smaller 
than  this. 

A  simple  way  to  keep  track  of  how  many  training  points  are  captured  by  each  cluster  is  to  change  slightly 
how  the  MAP  field  weights  are  updated.  Instead  of  setting  a  weight  to  1  when  a  new  cluster  is  formed,  the 
weight  is  incremented  by  some  ctxistant,  5^,  every  time  a  training  vector  is  assigned  to  that  cluster.  The 
size  of  the  weight  now  indicates  how  many  training  vectors  the  associated  cluster  has  captured.  In  order  to 
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prevent  the  MAP  field  wei^  from  growing  beyond  bounds,  a  time  decay  term  is  added.  We  implement 
this  by  dividing  all  MAP  field  weights  by  2  at  the  end  of  every  iteration  throu^  the  training  data. 

23.2.  Restricted  creation  of  new  clusters 

The  rate  of  creatiCTi  of  new  clusters  may  be  limited.  For  example,  if  the  same  data  set  is  presented 
repeatedly  during  training,  the  number  of  clusters  that  is  created  Airing  each  iteration  may  be  limited  to 
some  small  value,  for  example  the  total  number  of  classes.  This  causes  existing  clusters  to  grow  as  much 
as  possible,  generally  reducing  the  total  number  of  clusters  formed. 

2,3.3.  Choice  function  favoring  larger  clusters 

As  described  by  Carpenter,  et  al.^  the  Fuzzy  ART  weights  associated  with  each  cluster  may  be  envisioned 
as  two  points  defining  a  hyper-rectangle  in  A/-dimensiOTal  space,  where  is  the  number  of  axnponents  in 
each  data  vector  (before  complement  coding).  As  a  result  of  the  original  Weber  law  choice  function 
(eq.  1),  if  a  data  point  falls  inside  both  of  two  overlapping  rectangles,  tte  smaller  rectangle  (corresponding 
to  the  larger  IwJ)  wins.  This  guarantees  that  a  cluster  created  in  response  to  some  training  vector  captures 

the  training  vector  if  it  is  immediately  presented  again  (unless  another  rectangle  with  zero  area  already 
exists  at  the  same  positicxi,  as  may  happen  when  the  training  set  contains  two  identical  vectors  with 
Afferent  class  associations).  Furthermore,  if  a  data  point  falls  midway  between  two  weight  rectangle,  the 
smaller  one  wins.  While  it  is  necessary  to  preserve  the  former  property,  the  latter  one  has  undesirable 
consequences.  In  particular,  a  new  cluster  will  capture  data  points  from  a  large  volume  of  feature  space 
previously  captured  by  a  nearby  large  cluster.  Intuitively,  a  small  weight  rectangle  results  fix)m  a  very 
localized  set  of  training  points,  whereas  a  large  weight  rectangle  results  from  a  spread  out  set  of  training 
points.  If  a  test  point  lies  equidistant  from  the  two  rectangles,  it  lies  more  standard  deviations  away  from 
the  smaller  rectangle  than  from  the  larger  one,  and  therefore  should  be  assigned  to  the  latter.  The  following 
alternate  choice  functicHi  provides  the  desired  classification; 

T^{1)  =  +  [|/  A  w\  - \wj\]/a) 

As  in  the  Weber  law  choice  functiOT,  a  is  a  small  constant.  Here,  a  determines  the  range  at  which  a  larger 
weight  rectangle  is  favored  over  a  smaller  one.  (/)  may  be  less  than  0.  Note  that  the  quantity  \w\  need 

only  be  computed  (and  stored  locally)  when  the  weights  of  cluster;  are  updated. 
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Figure  2  Win  regions  of  two  clusters,  one  with  a  large  weight  rectangle  and  one  with  a  small  weight 
rectangle,  when  the  Weber  law  choice  functirai  is  used  (left)  and  when  the  alternate  choice  function 
favoring  large  weight  rectangles  is  used  (right). 

2  J.4.  Bi-directional  weight  movement 

The  Fuzzy  ART  learning  law  assures  that  weight  rectangles  only  grow  with  time  and  never  become 
smaller.  Consequently,  after  enough  iterations  through  the  training  data,  every  training  point  will  lie  either 
within  or  on  the  boundary  of  the  weight  rectangle  of  the  cluster  which  captures  it.  This  tends  to  create 
rectilinear  win  region  boundaries  throughout  the  feature  space,  which  is  often  not  desirable.  In  each 
dimension  of  the  feature  space,  if  the  closer  of  the  two  points  defining  the  weight  rectangle  is  moved  toward 
the  training  dau  point  regardless  of  whether  the  point  lies  inside  or  outside  the  rectangle,  the  weight 
rectangle  can  grow  and  shrink.  Thus  the  cluster  boundaries  can  better  approximate  the  shapes  of  the 
Bayesian  decision  boundaries  between  intersecting  class  distributions.  Note  that  without  pruning  or  limited 
rate  of  creation  of  new  clusters,  this  method  often  leads  to  the  creation  of  more  clusters.  One  of  the 
methods  for  limiting  the  number  of  clusters  formed  must  generally  be  used  for  bi-directional  weight 
movements  to  be  beneficial. 

23^.  Cluster  relabeling 

If  the  match  function  S/I)  is  close  to  /  (implying  that  the  training  point  fell  close  to  a  very  small  weight 

rectangle),  and  yet  the  classes  associated  with  cluster  J  and  training  vector  I  differ,  the  standard  Fuzzy 
ARTMAP  creates  a  new  cluster.  Since  tiling  even  a  small  volume  of  the  feature  space  with  very  small 
rectangles  requires  a  large  number  of  clusters,  this  is  generally  undesirable.  Instead,  it  may  be  desirable  to 
increment  the  MAP  field  entry  (7,  class,)  if  S/I)  >p„^, .  where  is  a  threshold  constant  close  to  7 . 
This  leads  to  two  ncm-zero  entries  in  the  MAP  field,  associating  one  cluster  with  two  classes.  When  a  test 
vector  is  captured  by  such  a  cluster,  it  is  possible  to  assign  a  the  vector  a  fuzzy  membership  in  each  of  the 
classes,  or  to  output  a  single  class  label  based  on  the  relative  magnitudes  of  the  two  MAP  field  entries 
(consistent  with  section  2.3.1).  If  cluster  relabeling  is  utilized,  it  is  important  to  also  choose  the  choice 
functicMi  favoring  larger  weight  rectangles  (section  2.3.3)  to  prevent  clusters  associated  with  multiple 
classes  from  capturing  points  which  can  be  unambiguously  assigned  to  a  single  class. 
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2.4.  Results 


We  have  tested  the  above  variants  on  several  data  sets,  including  artificial  data,  formant  frequencies  of 
steady  state  vowels  (2  to  4  dimensional  vectors),  and  cq)stral  coefBciaits  extracted  frcxn  coitinuous  speech 
(10  to  20  dimensiaial  vectors).  On  each  data  set,  the  best  results  were  achieved  using  a  ccxnbinati(xi  of  the 
proposed  algorithm  modifications.  However,  no  combination  of  variants  and  parameter  values  yielded 
particularly  good  performance  on  all  data  sets.  One  cause  of  this  is  that  the  class  distributions  vary  widely 
between  data  sets.  Another  is  the  varying  dimensionality  of  the  feature  spaces  —  a  fixed  number  of  clusters 
will  tile  a  twenty  dimensitxial  hypercube  very  differently  than  a  two  dimensional  square. 

Table  1  and  Figure  3  illustrate  the  performance  of  the  Fuzzy  ARTMAP  variants  on  the  Peterson-Bamey 
vowel  data  in  two  dimaisions.  SirKe  several  identical  training  data  values  are  labeled  with  different 
classes,  the  number  of  clusters  generated  by  the  standard  algorithm  grows  without  bounds.  The  parameter 
set  yielding  the  best  classificatirai  generated  620  clusters  after  20  iteratirais  through  the  training  set.  The 
smallest  number  of  clusters  generated  by  the  standard  algorithm  after  one  pass  through  the  training  data 
was  104,  giving  57%  correct  classificatirai  of  the  test  set.  In  amtrast,  the  modified  algorithm  achieved  up 
to  72%  correct  classification  with  105  clusters  and  74%  correct  with  141  clusters. 
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71 

0.01 

0.1 

5 

- 

0.5 

0.99 

Alternate 

87 

69 

Standar 

d  Fuzz 

V  ARTMAP 

0.01 

0.05 

20 

• 

liH 

• 

Weber 

620 

71 

0.01 

0.05 

5 

. 

ES 

- 

Weber 

450 

69 

0.01 

1.0 

5 

. 

0.5 

Weber 

243 

68 

0.01 

1.0  i 

2 

m 

- 

Weber 

181 

66 

0.01  ! 

0.25  1 

5 

. 

inti 

- 

Weber 

247 

65 

0.01 

1.0 

1 

- 

0.5 

- 

Weber 

104 

57 

Table  1  Performance  of  Fuzzy  ARTMAP  on  the  Peterson- Vowel  data  (formant  frequencies  ft  and  f2).  The 
parameter  values  shown  here  yielded  the  highest  performance.  Results  of  numerous  simulations  yielding 
equivalent  or  worse  performance  are  not  included.  The  bold  parameters  were  used  to  generate  the  plots  in 
Figure  3. 
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Figure  3  Map  of  feature  space  generated  by  the  standard  Fuzzy  ARTMAP  algorithm,  left,  and  by  the 
modifted  algorithm  using  bi-directional  weight  movement,  limited  cluster  creation,  pruning,  and  the 
alternate  choice  function,  right.  The  numerous  small  clusters  in  the  left  plot  indicate  overtraining  by  the 
standard  algorithm.  Both  variants  were  trained  cm  1020  points  randomly  chosen  from  the  Peterson-Bamey 
vowel  data  plotted  in  Figure  1. 

Note  that  this  is  a  very  (fifficult  classificaticm  task  -  for  comparison,  the  best  classification  accuracy  we 
were  able  to  achieve  with  optimized-leaming-rate  LVQl’  initialized  by  k-means  clustering  was  70.2% 
correct.  While  the  classification  performance  of  LVQ  is  similar  to  that  of  the  modified  Fuzzy  ARTMAP, 
the  latter  provides  several  advantages.  Notably,  due  to  the  use  of  the  Li  norm.  Fuzzy  ARTMAP  does  not 
require  a  multiplication  c^raticm  per  input  vector  compcment.  Furthermore,  it  need  not  be  initialized  using 
an  unsupervised  clustering  algorithm,  facilitating  on-line  learning.  Finally,  because  the  Fuzzy  ART 
weights  directly  define  not  only  the  position,  but  also  the  extent  of  each  cluster,  simitar  classification 
accuracy  can  sometimes  be  achieved  with  fewer  clusters. 

We  presented  this  work  to  at  the  Worid  Congress  on  Neural  Networks  in  September,  1996. 

3.  Optical  transpose  interconnection  system 

Optical  interconnectiems  have  advantages  over  elcarical  interconnections  in  terms  of  speed,  power,  and 
density  for  high-speed  links  over  distances  on  the  order  of  centimeters*.  Here,  we  are  concerned  with  a 
particular  interomnectiem  pattern,  the  k-shuflle,  which  is  commonly  used  in  switching  applications  and  to 
interconnect  processors  in  parallel  computing  systems’.  Section  3.1  shows  how  to  map  the 
backpropagatiem  neural  network  onto  this  interconnection  pattern.  The  optical  transpose  interconnectitm 
system*®  (OTIS)  architecture  provides  the  it-shuffie  using  tmly  a  pair  of  lenslet  arrays.  We  designed, 
fabricated,  and  tested  birefringent  computer  gaierated  holograms  (BCGH)  for  implementing  this  system 
with  reflective  modulators.  Furthermore,  we  developed  a  novel  optical  system  for  implementing  the  OTIS 
architecture  utilizing  surface  normal  transmitters.  We  carried  out  a  detailed  analysis  and  optimization  of 
the  latter  system  based  on  Gaussian  beam  propagation  theory. 
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The  /t'Shuffle  is  a  one-to-one  interconnection  between  L  iiq)ut  nodes  and  L  output  nodes,  where  L  is  the 
product  of  two  integers,  M  and  N.  The  input  iKxles  are  divided  into  M  groups  of  N  members,  and  the 
output  nodes  are  divided  into  N  groups  of  M  members.  The  ik-shuffle  cainects  iiput  node  (i,  j)  to  output 
node  (j,  i),  where  the  first  number  indexes  the  group  and  the  second  number  indexes  the  element  within  a 
group.  This  operation  may  also  be  viewed  as  the  transpositi«i  of  an  M\N  matrix. 

In  the  OTIS  system,  the  nodes  of  each  group  are  arranged  in  a  square  pattern  to  limit  the  length  of 
electrical  interccHinecticxis  within  the  group.  Similarly,  the  groups  are  arranged  in  a  square  pattern.  Thus, 
we  define  M  =  =  #,2X#,2  and  fV  =  #«2X#«2  =  #,/x#,/.  where  is  the  number  of  members  along 

each  edge  of  a  group  and  #*  is  the  number  of  groups  alraig  an  edge  of  the  processing  plane.  OTIS  uses  (me 
lenslet  per  group,  as  shown  in  cross-secticm  in  Rgure  4. 


Figure  4.  Optical  transpose  interctmnection  system  for  individually  directed  transmitters.  This  example  is 
drawn  for  8x8  groups  of  8x8  transmitters  (receivers)  per  group,  for  a  total  of  4096  channels. 

3.1.  Application  to  backpropagation 

Currently,  the  most  widely  used  neural  network  learning  algorithm  is  probably  backpropagation".  Here, 
we  demonstrate  how  the  OTIS  system  may  be  used  to  optically  implement  the  interconnections  between 
backpropagation  processing  units. 

Figure  5  shows  a  neural  architecture‘s  for  implementing  the  backpropagation  algorithm.  Each  layer 
consists  of  a  number  of  units.  In  order  to  (xmform  to  the  rerjuirement  that  a  neural  network  processing 
element  (PE)  have  only  a  single  output,  each  unit  is  subdivided  into  a  sun  and  multiple  planet  PEs.  One 
backpropagation  weight,  wm ,  is  stored  at  each  planet 

During  the  forward  pass,  the  output,  2;, ,  of  unit  i  of  layer  /  is  computed  as  the  scaled  sum  of  the  outputs  of 
the  previous  layer  multiplied  by  the  unit’s  corresponding  weights: 

Zi.  = 
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During  the  backward  pass,  a  set  of  error  signals  is  calculated.  Each  unit’s  contribution,  ,  to  the  global 
error  is  computed  as  the  scaled  sum  of  the  errors  of  the  next  layer’s  error  signals  times  the  weights 
corresponding  to  that  unit  in  the  next  layer 

5ii=  a'{l  li  )Zt  ^{M)ki^{M)k 

Each  weight  is  modified  in  proportiai  to  its  input  and  error  signals  in  order  to  reduce  the  global  error. 


Figure  5.  Two  units  of  a  backpropagation  neural  network. 

This  architecture  may  be  mapped  onto  the  OTIS  optical  system  by  (kvoting  one  OTIS  group  to  each 
backpropagation  unit  and  assigrang  a  single  transmitter/recdver  pair  to  each  planet.  The  interconnections 
within  a  unit  are  implemented  dectrically.  The  forward-pass  communication  between  layers  is 
implemented  by  electrically  fanning  out  each  sun’s  output  to  the  associated  planets  and  then  transmitting 
the  signals  to  the  next  layer  utilizing  the  one-to-one  optical  interconnections.  During  the  backward  pass, 
the  error  signals,  calculated  at  the  planet  PEs,  are  optically  transmitted  to  the  previous  layer  and  electrically 
summed  there.  Thus,  the  long  interconnections  are  implemented  optically,  whereas  the  short 
interconnections  arc  implemented  electrically. 

A  multi-layer  backpropagation  system  could  be  implemented  as  multiple  OTIS  systems  interfaced  by  beam 
splitters  or  as  a  single  OTIS  system  with  the  electronics  multiplexed  to  compute  the  signals  for  the  multiple 
layers  sequentially. 

3.2.  System  demonstration 

For  high  efficiency  and  power  uniformity,  the  optical  systan  shown  in  Figure  4  requires  individually 
directed  transmitters.  To  achieve  directed  signal  beams  in  a  system  utilizing  reflective  modulators  (Figure 
6),  an  additional  array  of  illumination  lenslets  is  required  (Figure  7).  However,  the  interconnect  lenslets 
and  illumination  lenslets  may  be  implemented  as  a  single  optical  element  if  birefringent  computer-generated 
hologram  (BCGH)  technology'^  is  utilized  We  have  designed  buUt,  and  characterized  such  holograms. 
One  of  the  BCGH  elements  is  shown  in  Figure  9.  The  modulator  input  spots  generated  by  this  element  are 
shown  in  Figure  1 1,  and  the  spots  produced  by  the  same  element  at  the  intermediate  image  plane  of  the 
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OTIS  interoxinectiai  system  are  shown  in  Figure  12.  Beam  scans  throu^  these  planes  are  shown  in 
Figure  13.  The  spot  size  measured  using  the  beam  scan  instrumoit  traxls  to  be  less  accurate  (larger)  than 
that  measured  by  imaging  the  spots  wito  a  CCD  camera  because  of  the  difficulty  in  aligning  the 
BeamScan’s  slit  with  a  row  of  spots. 


■Power 

Interconnections 


Solder 

Bumps 


Modulator 

Drivers, 

Detectors, 

Amplifiers, 

Processors 


Birefringent 
Diffractive  Lenslet 
Array  /  Illumination 
Grating  Arrayv 


Power/Interconnect 
Coupling  Element 


Modulator 

Drivers, 

Detectors, 

AmpliHers, 

Processors 


Optical  Power  2 
Modulators  1  Modulators  2 

Figure  6.  Schematic  diagram  of  an  optoelectronic  system  utilizing  reflective  modulators. 


OTIS  Icnskn 


Illumination  lensleus 


Figure  7.  niuminatitxi  lenslets  and  interconnect  lenslets  may  be  superimposed  in  the  same  plane  by 
utilizing  birefringent  computer-generated  holograms  (BCXjH). 


11 


GaAs:Si  CMOS  BCGH 


Plate 


BCGH 


GaAsrSi  CMOS 


V4 

Plate 


1cm 

- ► 

Figure  8.  OTIS  demonstratirai  system  schematic.  The  experimental  system  bi-directionally  interconnects 
two  planes,  each  COTsisting  of  an  8x4  modulator  array  and  an  8x4  detector  array.  The  system  length  is 
2.48  centimeters. 


Figure  9.  Birefringent  computer  generated  hologram  implementing  both  the  illumination  lenslets  and  the 
OnS  interconnection  lenslets.  The  minimum  feature  size  of  the  hologram  is  5pm. 
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Figure  10.  Optical  bench  set-up  used  to  characterize  the  OTIS  BCGH. 


Figure  1 1.  Modulatw  input  generated  by  BCGH  illumination  lenslets.  The  full  width  at  half-maximum 
(FWHM)  spot  diameters  in  the  modulator  plane  range  from  2Spm  to  37pm. 
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Figure  12.  Spots  in  the  intermediate  image  plane  (the  focal  plane  of  the  intercOTnect  elements)  of  the  OTIS 
system.  The  light  recorded  here  has  passed  through  the  BCGH  elements  twice,  experiencing  a  different 
lens  function  on  each  of  the  two  passes. 


Figure  13.  Beam  scans  of  spots  in  the  modulator  plane  Geft)  and  intermediate  image  plane  (right)  of  the 
OTIS  system. 


3.3.  Optical  transpose  interconnection  system  for  vertical  emitters  (OTIS-VE) 

We  dcvel(^)ed  a  novel  c^tical  system  design  for  implementing  the  OTIS  architecture  utilizing  surface 
normal  transmitters.  This  eliminates  the  need  for  BCGH  (described  in  the  previous  section)  to  implement  a 
reflective  modulator-based  system,  and  facilitates  implementations  utilizing  vertical  cavity  lasers 
(VCSELs).  The  drawback  of  the  newer  design  is  that  the  system  length  is  in  some  cases  longer  than  that  of 
the  original  design.  Which  system  is  most  suitable  for  a  given  applicaticMi  wfll  depend  on  the  type  of 
transmitters  used,  the  desired  spacing  between  transmitters  and  between  groups,  available  packaging 
technology,  and  the  relative  penalties  for  crosstalk,  optical  losses,  and  system  size. 

In  the  OTIS  system,  each  lens  axis  is  generally  not  centered  over  a  group.  However,  if  surface-normal 
transmitters  are  to  be  used,  each  lens  aperture  should  be  centered  over  the  corresponding  group.  Thus,  the 
lens  functions  are  asymmetrically  crisped  (Figure  14).  The  parallel  beams  from  a  group  of  transmitters 
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cross  at  the  rear  focal  pdnt  of  the  lens  corresponding  to  that  group.  Each  beam  then  propagates  through 
the  focal  point  of  a  lens  in  the  secwid  lenslet  plane.  In  addition  to  focusing  the  beam,  the  sectMid  lenslet 
deflect  the  beam  to  again  be  parallel  to  die  system  axis. 

The  optical  system  is  bi-directicaiaL  Furthermore,  if  #*7=^*2  and  #m7=^m2»  3  folded  system  may  be 
cxHistructed  using  a  single  lenslet  array  and  a  mirror  to  interconnea  nodes  within  a  single  processing  plane. 


Figure  14.  Optical  transpose  intercainection  system  for  vertical  emitters  (OTIS-VE).  Two  planes  of 
lenslets  rearrange  the  data  such  that  each  group  in  the  left  plane  is  connected  to  all  groups  in  the  right  plane 
and  visa  versa.  The  system  is  bi-directional. 


3.4.  Gaussian  beam  analysis 


We  carried  out  a  Gaussian  beam  analysis’^  in  order  to  determine  the  feasibility  of  using  current  refractive 
or  diffractive  (CGH)  optics  technology  to  build  the  system  described  above. 


The  basic  equations  governing  the  propagation  of  Gaussian  beams  are 


©^(z)  =  ©0 


.  ^oj 


R(z)  =  z\l+-^ 


propagation  of  a  beam  in  free  space: 

_  ncol 
°  X 


and  propagation  through  a  thin  lens: 


Ri~  R,  f' 


as  illustrated  in  Figure  15  and  Figure  16. 


Figure  15.  Gaussian  beam  propagation  in  free  space. 


kX 
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Figure  1 6.  Effect  of  a  thin  lens. 

The  geometry  of  the  interconnectian  (similar  triangles  in  Figure  14)  leads  to  the  relationship 

dtt  =  (4, 2/^1/  ^l)fl  =  +-^)/2  • 

Furthermore,  the  COTStraint  is  imposed  that  nraie  of  the  beams  may  be  clipped  (by  a  lens  aperture)  at  a 
radius  less  than  k  times  the  radius  of  the  e-2  irradiance  contour.  A  value  of  ^2.12  assures  that  the  beams 
remain  Gaussian'*  and  correspcmds  to  clipping  less  than  0.1%  of  the  optical  power.  The  maximum 
acceptable  radius  of  the  e-2  irradiance  caitour  at  the  detector  is  specified  as  a)d,f  Finally,  the  technology 
parameter  indicates  the  smallest  pennissible dumber  of  the  lenslets. 

These  equatiCHis  lead  to  a  constrained  search  of  the  five-dimoisional  parameter  space  Ag2,fi,  dpu,  dpc, 
the  objective  being  to  find  the  shortest  length  system  that  satisfies  the  above  requirements.  Since  the 
systan  is  bi-directiOTal,  the  clipping  ccxistraints  are  checked  at  both  loislet  planes  for  beams  propagating  in 
both  directions.  Typical  plots  of  minimum  system  length  vs.  iiput  beam  radius  are  shown  in  Figure  17  for 
a  symmetrical  system  consisting  of  64x64  transmitters  in  each  plane.  This  analysis  shows  that  the  system 
is  buildable  based  on  reasonable  values  of  the  technology  parameters. 


•»  0  (^‘m) 


Figure  17.  Length  of  a  system  with  64x64  transmitters  in  each  plane  for  various  values  of  the  input  beam 
waist,  tUb,  and  the  minimum  achievable  f/number,  Fmu>- 

The  plots  show  that  it  is  possible  to  simultaneously  achieve  de-magnification  of  beams  propagating  in  both 
directions.  While  the  optical  system  is  symmetrical,  the  beams  are  not  -  a  beam  waist  occurs  at  the  input 
plane  but  not  necessarily  at  the  output  plane. 

For  small  values  of  coo  the  dumber  constraint  causes  the  system  to  be  large  or  not  feasible.  For  example, 
for  a  wavelength  of  A^.85pm,  far  from  the  beam  waist,  an  fll  lens  can  just  capture  a  beam  with  609=2.3 
without  cropping  it  inside  the  ib=2.12  contour.  It  can  not  capture  several  such  beams  that  are  adjacent  to 
CMie  another  without  excessively  clipping  some  of  them.  For  large  values  of  CO5,  the  input  beams  are  nearly 
collimated.  In  this  case,  their  intermediate  beam  waists  occur  near  the  rear  focal  points  of  the  first  lenslets. 
CcHisequently,  the  beams  expand  significantly  before  reaching  the  second  lenslet  plane.  Thus,  the  second 
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lenslet  apertures  must  be  large,  again  resulting  in  a  large  system.  The  combinaticm  of  these  two  effects 
iparig  to  an  q)timal  input  beam  size  that  results  in  the  smallest  optical  system. 

In  summary,  the  theoretical  q)tical  efficiency  of  the  novel  OTIS  systan  for  surface-normal  transmitters  is 
nearly  100%.  The  Gaussian  beam  analysis  shows  that  fabricaticwi  of  the  system  is  feasible  in  today’s 
technology. 

We  have  submitted  a  paper  describing  this  work  to  the  OSA  topical  meeting  (mi  Optics  in  Computing,  to  be 
held  in  March,  1997. 

4.  Major  accomplishments 

•  Refinement  of  a  the  Fuzzy  ARTMAP  neural  network  algorithm.  The  algorithm  is  very  well  suited  to 
hardware  implementati(xi,  but  previously  did  not  perform  well  on  most  real  problems. 

•  Design,  fabrication,  and  characterization  of  birefringent  computer  generated  holograms  to  implement 
the  OTIS  optical  system  for  reflective  modulators.  This  (^tical  system  provides  a  k-shuffle 
interconnection  pattern  that  may  be  used  to  implement  a  backpropagation  neural  networic. 

•  Development  of  a  simple,  compact,  efficient  optical  system  (OTIS-VE)  compatible  with  surface- 
normal  transmitters  (such  as  VCSELs)  implementing  the  k-shuffle  interconnection  pattern. 

5.  Future  work 

During  this  projea,  we  focused  our  efforts  on  the  development  of  hybrid  optoelectronic  neural  modules,  as 
well  as  on  defining  guidelines  as  where  to  use  optoelectronics  in  neural  computing  hardware.  Several 
additional  projects  are  currently  under  way  to  demonstrate  and  characterize  system  prototypes  utilizing  this 
technology. 
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